Clustering of Large Event Sets Using Delaunay Tessellations and K-medoid Optimization
Abstract
This paper introduces a new, fully automated method for identifying clusters in large event sets. The method uses a Delaunay tessellation to determine the clusters and their initial membership, then applies an iterative K-medoid optimization to refine cluster membership until it is stable. The method is robust and computationally efficient: by exploiting the neighbor connectivity information in the Delaunay tessellation, it improves on standard K-medoid optimization from O(n^2) to O(n log n). It produces clusters that meet three key criteria: 1) within each cluster, each event is closer to that cluster's representative event (the medoid) than to the representative events of nearby clusters; 2) each member event in a cluster has a closest neighbor that is no further away than Dmax; 3) no event in a cluster is further than Dmax from any other event in the cluster. Dmax is a user-defined parameter that controls the number and size of clusters. The basic algorithm consists of three steps. First, initial clusters are identified by forming a Delaunay tessellation for the entire set, then removing all edges longer than Dmax. Second, the initial clusters are subdivided using a medial-axis subdivision algorithm until no cluster has a maximum event-to-event span greater than Dmax. Third, given these groups, group membership and the K-medoid representative of each group are optimized in an iterative hill-climbing process. In most cases this sequence produces excellent results, but we have found rare cases where the method forms poor clusters or poor event-to-cluster assignments. Hence we have added a cleanup step. It can break up a cluster that consists of a main body of members and a few outliers, so that the main body can be merged with a nearby cluster (if one is available), and it can reassign an outlier to another cluster if that cluster has events near the outlier.
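The first and third steps can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the tessellation edges are already available as index pairs (here a small hand-written triangulation stands in for a real Delaunay tessellation), it skips the medial-axis subdivision of step two and the cleanup step, and its refinement loop compares every event with every medoid rather than using Delaunay neighbor connectivity, so it does not reproduce the O(n log n) behavior. The names `initial_clusters`, `medoid`, and `refine` are hypothetical.

```python
import math
from collections import defaultdict


def initial_clusters(points, edges, d_max):
    """Step 1 sketch: ignore edges longer than d_max; the connected
    components of the remaining graph are the initial clusters."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        if math.dist(points[a], points[b]) <= d_max:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return sorted(groups.values())


def medoid(points, members):
    """Member with the smallest summed distance to all members."""
    return min(
        members,
        key=lambda m: sum(math.dist(points[m], points[j]) for j in members),
    )


def refine(points, clusters, max_iter=100):
    """Step 3 sketch (brute force): alternate medoid selection and
    nearest-medoid reassignment until membership stops changing."""
    for _ in range(max_iter):
        medoids = [medoid(points, c) for c in clusters]
        new = [[] for _ in medoids]
        for i, p in enumerate(points):
            k = min(
                range(len(medoids)),
                key=lambda k: math.dist(p, points[medoids[k]]),
            )
            new[k].append(i)
        new = [c for c in new if c]  # drop clusters that lost all members
        if new == clusters:
            break
        clusters = new
    return clusters, [medoid(points, c) for c in clusters]


# Illustrative data only: two tight groups of three events each.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
# Hand-written triangulation edges (assumed input, not computed here).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (1, 3), (2, 3), (1, 4), (2, 5)]

clusters = initial_clusters(points, edges, d_max=2.0)
clusters, medoids = refine(points, clusters)
print(clusters, medoids)  # [[0, 1, 2], [3, 4, 5]] [0, 3]
```

Raising `d_max` above the length of the longest edge keeps every edge and yields a single initial cluster, while a very small `d_max` leaves every event in its own cluster, which is the sense in which Dmax controls the number and size of clusters.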
The technique is demonstrated on a large set of ISC catalog events, and results for various regions are examined. We show how the number of clusters and the cluster membership change with different values of Dmax, and we compare results with and without the final cleanup step.
29th Monitoring Research Review: Ground-Based Nuclear Explosion Monitoring Technologies
Related resources
Solving Data Clustering Problems using Chaos Embedded Cat Swarm Optimization
In this paper, a new method is proposed for solving the data clustering problem using Cat Swarm Optimization (CSO) algorithm based on chaotic behavior. The problem of data clustering is an important section in the field of the data mining, which has always been noted by researchers and experts in data mining for its numerous applications in solving real-world problems. The CSO algorithm is one ...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information gathering technologies and access to large amounts of data, we always require methods for data analyzing and extracting useful information from large raw dataset and data mining is an important method for solving this problem. Clustering analysis as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers
In current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. This clustering method can find fuzzy cluster centers in the proposed method, where fuzzy cluster centers contain more points from the corresponding cluster, the higher clustering accuracy. Also, triangular fuzzy numbers are utilized to demonstrate uncertain data. To compare triangular fuzzy ...
12 Discrete Aspects of Stochastic Geometry
Stochastic geometry studies randomly generated geometric objects. The present chapter deals with some discrete aspects of stochastic geometry. We describe work that has been done on familiar objects of discrete geometry, like finite point sets, their convex hulls, discrete point sets, arrangements of flats, tessellations of space, under various assumptions of randomness. Most of the results to ...
Publication date: 2010